This report analyses consumer behaviour with respect to card spending habits. Three main datasets were selected for the analysis: Irish card spending, which also includes spending levels by county; the UK monthly card spending report from the Bank of England; and the Revolut spending report released by the Bank of England.
Datasets Description
Data are collected from payment service providers (PSPs) and payment system operators (PSOs) resident in the Republic of Ireland. PSPs include banks, credit unions, payment institutions and e-money institutions, while PSOs manage the systems that facilitate funds transfers between PSPs. The statistics can be used to identify trends in payments and are essential in helping policymakers take well-informed decisions, as well as identifying and monitoring developments in the payments markets within the EU, and for assisting in the promotion of the smooth operation of payment systems. The data contain both the number and the value of transactions made through various payment instruments, including debit cards (“Payment Statistics | Central Bank of Ireland | Central Bank of Ireland - Centralbank.ie”).
First, the CSV files are read into R. The UK monthly data are then joined with the Revolut UK monthly data to analyse how consumers spend with Revolut and on which categories they spend most by card. The two tables are inner joined on the month column (Month in the UK data, Month_r in the Revolut data). A column name ending in _r denotes a category from the Revolut dataset.
Code
# Load the packages used in this step
library(lubridate)  # dmy()
library(dplyr)      # inner_join()

# Read the CSV files and assign each dataset to a variable
ukdata  <- read.csv("ukdatamonthly.csv", header = TRUE)         # UK monthly data
iedata  <- read.csv("Valuespentcatwise_ie.csv", header = TRUE)  # Ireland monthly data
revolut <- read.csv("revolut.csv", header = TRUE)               # Revolut UK monthly data

# Drop rows that contain NA values
ukdata1 <- na.omit(ukdata)

# Work on a copy of ukdata1 for the join
ukdata_join <- ukdata1

# Convert the month columns of both datasets to the same "YYYY-MM" format
ukdata_join$Month <- dmy(paste0("01-", ukdata_join$Month))
revolut$Month_r   <- dmy(paste0("01 ", revolut$Month_r))
ukdata_join$Month <- format(ukdata_join$Month, "%Y-%m")
revolut$Month_r   <- format(revolut$Month_r, "%Y-%m")

# Inner join: "Month" in the UK data matches "Month_r" in the Revolut data
joined_data <- inner_join(ukdata_join, revolut, by = c("Month" = "Month_r"))
head(joined_data, 10)  # View the first rows of the joined dataset
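An inner join silently drops any month that appears in only one table, so it can be worth checking which months are lost. The following is a minimal base R sketch of that check, not the code used in this report. The two small tables and their values are made up, the month string formats ("Jan-23" and "Feb 2023") are assumptions, and `%b` parsing assumes an English locale.

```r
# Toy stand-ins for the two monthly tables (all values are hypothetical)
uk <- data.frame(Month = c("Jan-23", "Feb-23", "Mar-23"),
                 Staple = c(101, 103, 104))
rev <- data.frame(Month_r = c("Feb 2023", "Mar 2023", "Apr 2023"),
                  Fuel_r = c(120, 125, 130))

# Normalise both keys to "YYYY-MM" so that they can be matched
# (%b expects English month abbreviations)
uk$Month <- format(as.Date(paste0("01-", uk$Month), format = "%d-%b-%y"), "%Y-%m")
rev$Month_r <- format(as.Date(paste0("01 ", rev$Month_r), format = "%d %b %Y"), "%Y-%m")

# Base R equivalent of an inner join on Month / Month_r
joined <- merge(uk, rev, by.x = "Month", by.y = "Month_r")

# Months that exist in only one table and are therefore dropped by the join
dropped_uk <- setdiff(uk$Month, rev$Month_r)
dropped_rev <- setdiff(rev$Month_r, uk$Month)

print(joined)
print(dropped_uk)
print(dropped_rev)
```

If either `dropped_uk` or `dropped_rev` is non-empty, the joined table covers a shorter period than the inputs, which matters when reading trends from the plots.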
Revolut fuel spending, staple spending by bank debit card and the aggregate are plotted with time on the x-axis and the index value (February 2020 = 100) on the y-axis. The interactive scatter plot is built with the plotly package. Because the series are indexed, each point shows the level relative to February 2020, so a value of 120 means spending is 20% above that month.
The output graph shows that the Revolut fuel index is higher than the bank debit card staple index; that is, fuel spending on Revolut cards has grown more relative to February 2020 than staple spending on bank debit cards.
Code
library(plotly)

# Data frame holding the columns to plot
df <- data.frame(
  x = joined_data$Month,      # x-axis: month
  y = joined_data$Fuel_r,     # Revolut fuel column
  a = joined_data$Aggregate,  # Aggregate column
  s = joined_data$Staple      # Staple column
)

# Slider steps: each step makes exactly one trace visible
steps <- list(
  list(args = list("visible", c(TRUE, FALSE, FALSE)),  # show Fuel_r (green)
       label = "Green", method = "restyle", value = "1"),
  list(args = list("visible", c(FALSE, TRUE, FALSE)),  # show Aggregate (blue)
       label = "Blue", method = "restyle", value = "2"),
  list(args = list("visible", c(FALSE, FALSE, TRUE)),  # show Staple (red)
       label = "Red", method = "restyle", value = "3")
)

# Scatter plot: Fuel_r is visible at first, the other two traces are hidden
fig <- plot_ly(df, x = ~x, y = ~y, type = "scatter", mode = "markers",
               marker = list(size = 10, color = "green"),
               name = "Fuel_r") %>%
  add_trace(x = ~x, y = ~a, type = "scatter", mode = "markers",
            marker = list(size = 10, color = "blue"),
            name = "Aggregate", visible = FALSE) %>%
  add_trace(x = ~x, y = ~s, type = "scatter", mode = "markers",
            marker = list(size = 10, color = "red"),
            name = "Staple", visible = FALSE)

# Layout with slider and legend
fig <- fig %>%
  layout(
    title = "Spending on Fuel by Revolut, Aggregate & Staple",
    sliders = list(list(
      active = 0,  # slider steps are 0-indexed, so 0 selects the first step (Fuel_r)
      currentvalue = list(prefix = "Color: "),
      pad = list(t = 60),
      steps = steps
    )),
    legend = list(title = list(text = "Data"))  # legend title
  )
fig
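The plotted series are index values with February 2020 set to 100. As a worked illustration of how such an index relates to the underlying spend, the sketch below rebases a made-up monthly series; the spend figures are hypothetical and are not taken from the datasets.

```r
# Hypothetical monthly card spend (any currency unit), keyed by "YYYY-MM"
spend <- c("2020-02" = 250, "2020-03" = 200, "2020-04" = 150, "2020-05" = 275)

# Rebase so that the February 2020 value equals 100
index <- 100 * spend / spend[["2020-02"]]

# Percentage change relative to February 2020
pct_change <- index - 100

print(index)
print(pct_change)
```

An index only describes change relative to the base month, so two indexed series can be compared for growth but not for absolute spending levels.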
Mermaid Dataflow diagram of Joined dataset.
An inner join is used to combine the two datasets. The Month column of the UK data and the Month_r column of the Revolut data serve as the join key, and the result contains a single Month column plus all other columns of both datasets.
graph TD;
A(Ukdata) --> B{na.omit}
B -->|Output| C(Ukdata1)
C --> D(Ukdata_join)
D --> E{dmy}
E -->|Output| F(Ukdata_join)
G(Revolut)
G --> H{dmy}
H -->|Output| I(Revolut)
I --> J{format}
J -->|Output| K(Revolut)
F --> L{format}
L -->|Output| M(Ukdata_join)
K -->|Joining using 'Month_r'| N{Inner Join}
M -->|Joining using 'Month'| N
N --> O(Joined_data)
Text analysis of .txt file.
At the biggest GTC conference yet, NVIDIA founder and CEO Jensen Huang unveils NVIDIA Blackwell, NIM microservices, Omniverse Cloud APIs and more. Generative AI promises to revolutionize every industry it touches; all that’s been needed is the technology to meet the challenge.
To analyse the NVIDIA CEO's speech, the speech text was copied into a text file and the frequency of the words used in it was counted. The most frequent words give a rough summary of the speech's themes and tone.
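library(tibble)    # tibble()
library(dplyr)     # count(), %>%
library(tidytext)  # unnest_tokens()
library(ggplot2)   # plotting

# Read the speech text, one element per line
suppressWarnings(text <- readLines("AI.txt"))

# Split the text into one lower-cased word per row
text_df <- tibble(text = text) %>%
  unnest_tokens(word, text)

# Count how often each word occurs, most frequent first
word_counts <- text_df %>%
  count(word, sort = TRUE)

top_words <- head(word_counts, 10)  # Top 10 most frequent words

# Bar chart of the top words
ggplot(top_words, aes(x = reorder(word, n), y = n)) +
  geom_bar(stat = "identity") +
  labs(x = "Word", y = "Frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability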
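A raw word count is usually dominated by function words such as "the" and "of", which say little about the themes of a speech. The base R sketch below shows one way to filter them out before counting. The two sentences and the short `stop_list` are made up for illustration; a real analysis would use a fuller stop-word list.

```r
# Made-up sample text standing in for the speech
speech <- c("The future of AI is the future of computing.",
            "AI is a new industrial revolution.")

# Hypothetical, deliberately short stop-word list
stop_list <- c("the", "of", "is", "a", "and", "to", "in")

# Lower-case the text and split on anything that is not a letter
words <- unlist(strsplit(tolower(speech), "[^a-z]+"))
words <- words[nzchar(words)]  # drop empty tokens

# Keep only the words that are not in the stop-word list
content_words <- words[!words %in% stop_list]

# Frequency table, most frequent first
word_counts <- sort(table(content_words), decreasing = TRUE)
print(head(word_counts, 3))
```

With the stop words removed, the top of the table reflects the subject matter rather than grammar.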
Text Analysis of .pdf article on debit card usage in US.
Debit card use at the point of sale has grown dramatically in recent years in the United States and now exceeds the number of credit card transactions. However, many questions remain regarding patterns of debit card use, consumer preferences when using debit, and how consumers might respond to explicit pricing of card transactions. Using a new nationally representative consumer survey, this paper describes the current use of debit cards by U.S. consumers, including how demographics affect use (Borzekowski, Kiser, and Ahmed 2008).
library(pdftools)      # pdf_text()
library(tm)            # Corpus(), tm_map(), DocumentTermMatrix()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()

# Read the PDF file (one character string per page)
pdf_pages <- pdf_text("dcuse.pdf")

# Create a corpus
corpus <- Corpus(VectorSource(pdf_pages))

# Preprocess the text: lower-case it, then remove punctuation, numbers,
# English stop words and extra whitespace
suppressWarnings(corpus <- tm_map(corpus, content_transformer(tolower)))
suppressWarnings(corpus <- tm_map(corpus, removePunctuation))
suppressWarnings(corpus <- tm_map(corpus, removeNumbers))
suppressWarnings(corpus <- tm_map(corpus, removeWords, stopwords("en")))
suppressWarnings(corpus <- tm_map(corpus, stripWhitespace))

# Create a document-term matrix and convert it to a plain matrix
dtm <- DocumentTermMatrix(corpus)
mat <- as.matrix(dtm)

# Calculate word frequencies across all pages
word_freq <- colSums(mat)

# Create a word cloud of words that occur at least 15 times
wordcloud(names(word_freq), word_freq, min.freq = 15,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))
The most frequent words in the research paper are debit, usage, payment, households, credit and consumer.
Ireland data analysis (Interactive Line graph)
The Ireland data are taken from the Central Bank of Ireland website and cover debit card and other digital payment usage across counties and spending categories.
In the analysis below, a stacked bar plot shows, for each month, the cumulative card spend across the categories.
Code
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting
library(viridis)  # viridis colour palettes

# Parse the Month column (month abbreviation and two-digit year) as the first day of that month
iedata$Month <- as.Date(paste0(iedata$Month, "-01"), format = "%b-%y-%d")

# Reshape to long format: column names go to "Category", cell values to "Value"
iedata2 <- pivot_longer(iedata, -Month, names_to = "Category", values_to = "Value")

# Stacked bar chart of the cumulative spend per month, split by category
ggplot(iedata2, aes(x = Month, y = Value, fill = Category)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_viridis_d(option = "inferno") +
  theme_minimal() +
  labs(title = "Cumulative Spend Values", x = "Month", y = "Value") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b-%y") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
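The height of each stacked bar is the sum of all category values for that month. The sketch below reproduces the wide-to-long reshape in base R on a tiny made-up table and computes those monthly totals, as a way to sanity-check the chart. The category names and values are hypothetical, and base R is used here instead of `pivot_longer()` only to keep the example dependency-free.

```r
# Hypothetical wide table: one row per month, one column per category
wide <- data.frame(Month = as.Date(c("2023-01-01", "2023-02-01")),
                   Groceries = c(10, 12),
                   Education = c(2, 3))

# Base R wide-to-long reshape (same information as pivot_longer(),
# but rows are ordered by category rather than by month)
value_cols <- setdiff(names(wide), "Month")
long <- data.frame(
  Month = rep(wide$Month, times = length(value_cols)),
  Category = rep(value_cols, each = nrow(wide)),
  Value = unlist(wide[value_cols], use.names = FALSE)
)

# Total per month, which is the height of each stacked bar
monthly_total <- aggregate(Value ~ Month, data = long, FUN = sum)
print(long)
print(monthly_total)
```

The monthly totals should equal the row sums of the original wide table, which gives a quick check that no value was lost in the reshape.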
Interactive line graph for each category
In the graph below, a line is plotted for each category over the time period. It shows that the card spending trend differs from month to month. The trend depends on holiday seasons and festival months; in the education category, for example, cards are used most in September, when the academic session starts at many educational institutions.
The interactive graph makes it easy to compare categories with one another and to analyse the trend of an individual category.
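A seasonal pattern such as the September peak in education can also be checked numerically by averaging each calendar month across years. The following is a minimal sketch under made-up data: the 24 monthly values are hypothetical and are constructed so that September stands out.

```r
# Hypothetical two years of monthly Education spend with a September spike
long <- data.frame(
  Month = seq(as.Date("2022-01-01"), by = "month", length.out = 24),
  Value = rep(c(5, 5, 5, 5, 5, 5, 5, 5, 20, 5, 5, 5), times = 2)
)

# Calendar month number (1 = January, ..., 12 = December)
long$CalMonth <- as.integer(format(long$Month, "%m"))

# Average value per calendar month across all years
seasonal <- aggregate(Value ~ CalMonth, data = long, FUN = mean)

# Calendar month with the highest average spend
peak <- seasonal$CalMonth[which.max(seasonal$Value)]
print(seasonal)
print(peak)
```

Applied to a single category of the real long-format table, the same averaging would show whether a peak recurs in the same calendar month each year or is a one-off.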
Code
library(plotly)

# Copy the long-format data to iedata3
iedata3 <- iedata2

# Make sure Month is stored as a Date (yyyy-mm-dd)
iedata3$Month <- as.Date(iedata3$Month, format = "%Y-%m-%d")

# One colour per category, capped at the 8 colours available in "Dark2"
num_categories <- length(unique(iedata3$Category))
colors <- RColorBrewer::brewer.pal(min(num_categories, 8), "Dark2")

# Line plot with one trace per category (uses the iedata3 copy prepared above)
fig <- plot_ly(data = iedata3, type = "scatter", mode = "lines+markers",
               x = ~Month, y = ~Value,
               color = ~Category, colors = colors) %>%
  layout(
    title = "Category Values Over Time",
    xaxis = list(title = "Month", type = "date", tickformat = "%b-%y"),
    yaxis = list(title = "Value"),
    hovermode = "closest"
  )

# Legend behaviour: a single click isolates a category, a double click toggles it
fig <- fig %>%
  layout(legend = list(itemclick = "toggleothers", itemdoubleclick = "toggle"))
fig
Choropleth map of Ireland (card usage value in various regions)
The Nomenclature of Territorial Units for Statistics (NUTS) were created by Eurostat in order to define territorial units for the production of regional statistics across the European Union. In 2003 the NUTS classification was established within a legal framework (Regulation (EC) No 1059/2003).
As the administrative territorial breakdown of EU Member States is the basis of the NUTS classification, changes made under the 2014 Local Government Act prompted a revision to the Irish NUTS 2 and NUTS 3 Regions.
The NUTS 3 nomenclature is used to plot card spend values: county-level values are combined into the eight NUTS 3 regions.
The map shows that IE061 (Dublin) has the highest card spending of all regions.
Code
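library(giscoR)  # gisco_get_nuts()
library(sf)      # plot method for sf objects, st_centroid(), st_coordinates()
library(dplyr)   # select(), left_join()

year_ref <- 2021

# NUTS 3 boundaries for Ireland (the "IT" in the object names is only a label)
nuts3_IT <- gisco_get_nuts(year = year_ref, resolution = 20,
                           nuts_level = 3, country = "Ireland") %>%
  select(NUTS_ID, NAME_LATN)

# Card spending per NUTS 3 region
map <- read.csv("mapnew1.csv")

# Attach the spending values to the region polygons
nuts3_IT_data <- nuts3_IT %>%
  left_join(map, by = c("NUTS_ID" = "Nuts"))

# Choropleth map; reset = FALSE keeps the plot open so that labels can be added
plot(nuts3_IT_data[, "Observation_Value"], breaks = "jenks", reset = FALSE,
     main = "Choropleth map of spending by card in different regions (in Mill.)")

# Label each region at its centroid (one coordinate pair per region)
centroids <- st_coordinates(st_centroid(st_geometry(nuts3_IT_data)))
text(centroids, labels = nuts3_IT_data$NUTS_ID, cex = 0.8, pos = 3)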
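Combining county values into NUTS 3 regions comes down to a lookup from county to region code followed by a sum per region. The sketch below shows that step for five counties; the spend values are made up, and the column names `County` and `Value` are assumptions for illustration (only `Nuts` mirrors the join key used above).

```r
# Hypothetical county-level card spend (values are made up)
county_spend <- data.frame(
  County = c("Dublin", "Cork", "Kerry", "Galway", "Mayo"),
  Value = c(80, 30, 8, 15, 6)
)

# County to NUTS 3 code lookup for these five counties
# (IE061 Dublin, IE053 South-West, IE042 West)
nuts_lookup <- c(Dublin = "IE061", Cork = "IE053", Kerry = "IE053",
                 Galway = "IE042", Mayo = "IE042")

# Attach the region code to each county
county_spend$Nuts <- unname(nuts_lookup[county_spend$County])

# Sum the county values within each NUTS 3 region
region_spend <- aggregate(Value ~ Nuts, data = county_spend, FUN = sum)
print(region_spend)
```

A table shaped like `region_spend` can then be joined to the region polygons on the NUTS code, as the choropleth code does with its CSV file.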
Leaflet map of spending by card in Ireland
Because no GeoJSON file with boundaries for Irish counties is available, a choropleth cannot be created directly in a leaflet map.
Instead, the latitude and longitude of each county are used to plot the overall card payments made by customers in a given year. Dublin tops the chart with 90 million euro spent by card in that period.
Code
library(leaflet)

# County coordinates and card spending values
map <- read.csv("map.csv")
map$Long <- as.numeric(map$Long)
map$Lat  <- as.numeric(map$Lat)

# Create a leaflet map with one circle per county
m <- leaflet(data = map) %>%
  addTiles() %>%
  addCircleMarkers(
    ~Long, ~Lat,
    popup = ~paste("Geographical Description: ", Geographical_Description, "<br>",
                   "Observation Value: ", Observation_Value),
    radius = ~sqrt(Observation_Value) * 5,  # marker size scales with Observation_Value
    color = "red",
    fillOpacity = 0.5
  ) %>%
  addLegend(position = "bottomright", colors = "red",
            labels = "Observation Value", opacity = 1)

# Display the map
m
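The marker radius above is proportional to the square root of the value. A short worked example, using made-up values, shows why this is a reasonable choice: with a square-root radius, the area of each circle is proportional to the value it represents.

```r
# Hypothetical observation values for three counties
value <- c(Dublin = 90, Cork = 30, Galway = 15)

# Same scaling rule as in the map: radius grows with the square root of the value
radius <- sqrt(value) * 5

# Circle area is proportional to radius squared, so the area ratio
# between two markers equals the ratio of their values
area_ratio <- (radius[["Dublin"]] / radius[["Cork"]])^2
value_ratio <- value[["Dublin"]] / value[["Cork"]]
print(c(area_ratio = area_ratio, value_ratio = value_ratio))
```

Scaling the radius linearly with the value instead would make large values look disproportionately large, because area grows with the square of the radius.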
UK Dataset of card payment
UK spending using debit and credit cards. These are official statistics in development. Source: CHAPS, Bank of England. Index: February 2020 = 100, non-seasonally adjusted, nominal prices.
‘staple’ refers to companies that sell essential goods that households need to purchase, such as food and utilities
‘work-related’ refers to companies providing public transport or selling petrol
‘delayable’ refers to companies selling goods whose purchase could be delayed, such as clothing or furnishings
‘social’ refers to spending on travel and eating out
Code
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting

# Parse the Month column as the first day of each month
ukdata1$Month <- as.Date(paste0(ukdata1$Month, "-01"), format = "%b-%y-%d")

# Reshape to long format: column names go to "Category", cell values to "Value"
ukdata2 <- pivot_longer(ukdata1, -Month, names_to = "Category", values_to = "Value")

# First and last month in the data
start_date <- min(ukdata2$Month, na.rm = TRUE)
end_date   <- max(ukdata2$Month, na.rm = TRUE)

# Line plot of the index value for each UK category
ggplot(ukdata2, aes(x = Month, y = Value, color = Category, group = Category)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Index Values Over Time", x = "Month", y = "Value") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The staple category has the most stable trend in the UK, meaning there is little seasonal effect on it. Consumers spend roughly the same amount by card on this category throughout the year.
The work-related category has increased over the years, with some variation within each year. This means consumers are using cards more each year for work-related activities.
During the Covid-19 period from March 2020 to May 2020, card spending in the social category fell sharply because of movement restrictions. After that period it gradually increased.
Linear Regression model on UK data
Code
# Fit a separate linear regression of each category on Month
lm_models <- lapply(names(ukdata1)[2:ncol(ukdata1)], function(category) {
  lm(ukdata1[[category]] ~ Month, data = ukdata1,
     subset = !is.na(ukdata1[[category]]))
})

# Summarize the linear regression models
summary_lm <- lapply(lm_models, summary)

# Print the summaries, one per category
for (i in seq_along(lm_models)) {
  cat("Category:", names(ukdata1)[i + 1], "\n")
  print(summary_lm[[i]])
  cat("\n")
}
Category: Aggregate
Call:
lm(formula = ukdata1[[category]] ~ Month, data = ukdata1, subset = !is.na(ukdata1[[category]]))
Residuals:
Min 1Q Median 3Q Max
-27.997 -3.405 1.358 4.532 18.110
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.673e+02 6.271e+01 -4.263 9.91e-05 ***
Month 1.913e-02 3.304e-03 5.792 5.94e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.646 on 46 degrees of freedom
Multiple R-squared: 0.4217, Adjusted R-squared: 0.4091
F-statistic: 33.54 on 1 and 46 DF, p-value: 5.938e-07
Category: Delayable
Call:
lm(formula = ukdata1[[category]] ~ Month, data = ukdata1, subset = !is.na(ukdata1[[category]]))
Residuals:
Min 1Q Median 3Q Max
-34.889 -8.853 -0.978 8.832 42.483
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.880e+01 1.096e+02 0.628 0.533
Month 1.030e-03 5.772e-03 0.178 0.859
Residual standard error: 16.85 on 46 degrees of freedom
Multiple R-squared: 0.0006917, Adjusted R-squared: -0.02103
F-statistic: 0.03184 on 1 and 46 DF, p-value: 0.8592
Category: Social
Call:
lm(formula = ukdata1[[category]] ~ Month, data = ukdata1, subset = !is.na(ukdata1[[category]]))
Residuals:
Min 1Q Median 3Q Max
-39.748 -8.203 1.168 8.800 41.387
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.318e+02 9.938e+01 -6.358 8.41e-08 ***
Month 3.774e-02 5.236e-03 7.209 4.43e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.29 on 46 degrees of freedom
Multiple R-squared: 0.5305, Adjusted R-squared: 0.5203
F-statistic: 51.97 on 1 and 46 DF, p-value: 4.431e-09
Category: Staple
Call:
lm(formula = ukdata1[[category]] ~ Month, data = ukdata1, subset = !is.na(ukdata1[[category]]))
Residuals:
Min 1Q Median 3Q Max
-9.611 -5.320 -1.014 2.596 16.570
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -172.24798 43.65769 -3.945 0.00027 ***
Month 0.01495 0.00230 6.501 5.12e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.716 on 46 degrees of freedom
Multiple R-squared: 0.4788, Adjusted R-squared: 0.4675
F-statistic: 42.27 on 1 and 46 DF, p-value: 5.116e-08
Category: Work_Related
Call:
lm(formula = ukdata1[[category]] ~ Month, data = ukdata1, subset = !is.na(ukdata1[[category]]))
Residuals:
Min 1Q Median 3Q Max
-39.935 -7.762 1.005 12.145 26.804
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.021e+02 1.022e+02 -8.823 1.87e-11 ***
Month 5.332e-02 5.387e-03 9.898 5.64e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.73 on 46 degrees of freedom
Multiple R-squared: 0.6805, Adjusted R-squared: 0.6735
F-statistic: 97.97 on 1 and 46 DF, p-value: 5.638e-13
Overall Observations:
Month has a statistically significant positive effect on spending in every category except “Delayable”. The strength of the association varies: among the significant models, “Work_Related” has the highest R-squared (0.6805) and “Aggregate” the lowest (0.4217).
Delayable category - The regression shows that month has very little influence on spending in this category. The coefficient for the month variable (1.030e-03) is very small and statistically insignificant (p-value = 0.8592), and the R-squared is close to zero (0.0006917). This suggests there is no statistically detectable linear relationship between month and spending in the “Delayable” category.
Social category - The multiple R-squared value (0.5305) indicates that the model explains about 53% of the variance in spending within the “Social” category, a moderately strong positive association between month and spending. The month variable has a positive and statistically significant coefficient (3.774e-02; p-value < 0.001), so spending in the “Social” category tends to rise over time. Because Month was converted to a Date before fitting, and R stores dates as a number of days, the coefficient is the average change in the index per day rather than per month.
Staple category - The coefficient for the month variable (0.01495) is positive and highly significant (p-value = 5.116e-08), so the null hypothesis of no linear relationship between month and spending can be rejected. On average, spending in the “Staple” category increases slightly over time, and the model explains about 48% of the variance (R-squared = 0.4788).
Work-related category - The model explains about 68% of the variance in spending within the “Work_Related” category, according to the multiple R-squared value (0.6805), which is the strongest fit of the five models. The coefficient for the month variable (5.332e-02) is positive and statistically significant (p-value < 0.001), indicating a positive linear relationship between month and spending. Possible causes of the rise in work-related spending are higher expenses from increased business activity throughout the year, and recurring employment-related costs, such as software renewals or subscriptions, that occur all year long.
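The slopes in the regression output are easier to read when converted to a change per month or per year. The sketch below does that conversion for the reported coefficients, assuming that Month was stored as a Date (so the slopes are per day) and using an average year of 365.25 days. It also checks a standard identity for a regression with a single predictor: the F-statistic equals the square of the slope's t value.

```r
# Slopes copied from the regression output above (index points per day,
# assuming Month was a Date when the models were fitted)
slope_per_day <- c(Aggregate = 1.913e-02, Delayable = 1.030e-03,
                   Social = 3.774e-02, Staple = 0.01495,
                   Work_Related = 5.332e-02)

# Convert to index points per average month and per year
days_per_year <- 365.25
slope_per_month <- slope_per_day * days_per_year / 12
slope_per_year <- slope_per_day * days_per_year

# Consistency check: with one predictor, F equals the squared t value of the slope
t_month <- c(Aggregate = 5.792, Delayable = 0.178, Social = 7.209,
             Staple = 6.501, Work_Related = 9.898)
f_reported <- c(Aggregate = 33.54, Delayable = 0.03184, Social = 51.97,
                Staple = 42.27, Work_Related = 97.97)
f_from_t <- t_month^2

print(round(slope_per_month, 2))
print(round(slope_per_year, 2))
print(round(f_from_t - f_reported, 3))
```

The differences between `f_from_t` and `f_reported` come only from the rounding of the printed t values, which confirms that the output is internally consistent.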
Code
library(ggplot2)

# Make sure 'Month' is stored as a Date
ukdata1$Month <- as.Date(ukdata1$Month, format = "%b %Y")

# Create a list to store the plots
plots <- list()

# Loop through each category and create a plot with the regression line
for (category in names(ukdata1)[2:ncol(ukdata1)]) {
  # Scatter plot of the original data points; Month is converted to numeric (days)
  p <- ggplot(ukdata1, aes(x = as.numeric(Month), y = .data[[category]])) +
    geom_point() +
    labs(title = paste("Regression Model for", category),
         x = "Month", y = "Index Value") +
    scale_x_continuous(breaks = as.numeric(ukdata1$Month),
                       labels = format(ukdata1$Month, "%b %Y")) +  # Format Month labels
    theme_minimal()

  # Add the least-squares regression line (the same line that lm() fits)
  p <- p + geom_smooth(method = "lm", se = FALSE, color = "red")

  # Store the plot in the list
  plots[[category]] <- p
}

# Print the plots
for (category in names(plots)) {
  print(plots[[category]])
}
`geom_smooth()` using formula = 'y ~ x'
Conclusion
By analyzing card spending data from the UK and Ireland, we found that consumers in both countries primarily use cards for food and groceries.
There are significant seasonal effects on consumer spending behaviour across different categories. For example, in Ireland, card spending on education increases in September, while card spending on clothing increases in December each year, likely due to the holiday season. The effect of Covid-19 can also be seen in both countries as a decrease in spending in each category.
In Ireland, card spending is high in the June-July period and lower in December. A likely reason is that people travel more in summer than in winter.
In Ireland, consumers in County Dublin spend more by card than those in County Cork and County Galway, which follow in that order. This suggests a higher card penetration rate in Dublin than in other counties.
In the UK, consumers' Revolut card spending on fuel is significantly higher than on clothing and social. UK consumers also use cards more for work-related purchases than for social and clothing. Card spending on clothing saw its highest increase during the December-January period, and the same pattern recurs every year.
References
Borzekowski, Ron, Elizabeth K. Kiser, and Shaista Ahmed. 2008. “Consumers’ Use of Debit Cards: Patterns, Preferences, and Price Response.” Journal of Money, Credit and Banking 40 (1): 149–72.